
Winsock Stuff

Welcome to the Wonderful Wacky World of the World Wide Web Winsock, as World Wide Web Development (WWWDev) takes a Whack at trying to explain some of its finer points.
Please bear with me as I will try to start from the ground up.
This is NOT a comprehensive document by any means, but only an attempt to explain how to use the Winsock interface from Visual Basic.
This document assumes you are a Visual Basic programmer who wants to write software which talks to the Internet. You have been attempting to make sense of the existing documentation, and are, at this point, marginally suicidal.
You will need, unfortunately, the Winsock specification, available at a number of download sites. (Always in the wrong format, usually PostScript or Word, as if most of us programmers have the money and inclination to purchase bloated word processors.)
The specification itself is aimed at C programmers, and has the usual arcane terms and code examples.
There are also a number of expensive books out there, but they seem to get to a certain point and then it's as if the author had a deadline or lost interest and stopped. Or they are coming from a Unix/C perspective instead of from VB.
OK, enough of that. Here we go. Again, please bear with me as there is a certain amount of background to be covered in order to convert from the C world to Visual Basic.

Winsock
Winsock is an application programming interface (API) for the Windows environment. It is a modification of an earlier API for the Unix environment and actually does an admirable job of trying to fit a square peg into a round hole.
All of the references to 'Blocking', 'Non-blocking', WSAAsync..., etc. come from this conversion process.
Basically, the Unix operating system does not allow application programmers to operate in something called Interrupt Mode. They must do everything in Polled Mode. (Application programs are programs that ride on top of the built-in operating system and are usually designed to 'do' something. Systems programs are written to provide underlying services to Application programs. Windows is a system program. A program you might write, an Employee database, for example, is an Application.)

Before we get to whatever the heck that means, a word about where I'm coming from: I've got over 20 years programming experience. Cyber systems to DEC minis to PCs. Machine language, Assembler, C, C++, .....etc. DOS, OS, Unix......
So I guess I can poke a little fun at some institutional ideas and concepts. You know, like Unix being the e.e.cummings of operating systems.

Interrupt vs polling and blocking vs non-blocking:
In the systems world programmers have to work with very fast CPUs talking to very slow devices. An example is transmitting data over a phone line. You always have to send a character and then wait until the device can accept another.
If you use polled mode you will have a transmit loop that sends, loops (waits) until the device is ready, and then sends the next. It's this waiting that causes a problem. You are tying up CPU time which could be used for something else, like maybe another transmit routine. So the hardware guys came up with something called Interrupts. Now you simply arm the device (telephone interface) to send an interrupt whenever it is ready for another character, and then go off and do something else. The device needs a character and sends an interrupt. When the interrupt occurs, the interrupt system hardware interrupts your program, sets bookmarks so you can get back, and starts running your interrupt routine. This is a separate module which handles the feeding of another character to the device and then tells the hardware it is finished. The hardware uses the bookmarks to resume your main program at the point it was interrupted. Elegant no?
Unfortunately the applications guy in Unix can't do this (Unix?, the end-all be-all?). Instead he has to poll. Of course, Unix being multi-tasking, lets him merely write another routine for the other process...and another...and another.
Now in Windows (poor old slow Windows) there is a thing called messages. Guess what? These are like interrupts to your main program. Now all you have to do is tell your device handler to send a message to your message handler and then start your main routine. Whenever the device guy sends a message, Windows stops your main and throws you into the message routine (an Event!). Your message routine (you know, like a click event) processes the message and when it terminates control goes back to your main at the point it was interrupted.
That's why Winsock has all these add-ons to the original Berkeley Sockets stuff to get from polled to interrupt, being that polled in Windows has some very bad side effects, mainly performance problems.
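To make the polled vs interrupt difference concrete, here is a minimal sketch in C (the language Winsock itself speaks). Everything in it is hypothetical: the Device structure is a simulated slow device, not real hardware, and the "interrupt" is just a function call made when the simulated device goes ready. The polled version counts the ticks it burns waiting; the interrupt-style version counts the ticks that were free for other work.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simulated device: after each character it is busy for
   `latency` ticks before it can accept another one. */
typedef struct {
    int    latency;   /* ticks needed per character        */
    int    busy;      /* ticks left until ready again      */
    char   sent[64];  /* what has been transmitted so far  */
    size_t count;     /* number of characters transmitted  */
} Device;

static void device_tick(Device *d) { if (d->busy > 0) d->busy--; }

static void device_put(Device *d, char c) {
    assert(d->busy == 0 && d->count < sizeof d->sent - 1);
    d->sent[d->count++] = c;
    d->sent[d->count] = '\0';
    d->busy = d->latency;
}

/* Polled mode: send, then loop doing nothing until the device is ready.
   Returns the number of ticks spent purely waiting. */
static int send_polled(Device *d, const char *text) {
    int wasted = 0;
    for (; *text; text++) {
        while (d->busy > 0) { device_tick(d); wasted++; }
        device_put(d, *text);
    }
    return wasted;
}

/* Interrupt mode: the "interrupt routine" feeds one more character. */
typedef struct { const char *next; } TxState;

static void on_ready(Device *d, TxState *tx) {
    if (*tx->next) device_put(d, *tx->next++);
}

/* The main program keeps working; the routine runs only when the
   simulated device signals ready. Returns ticks left for other work. */
static int send_interrupt(Device *d, const char *text) {
    TxState tx;
    int useful = 0;
    tx.next = text;
    on_ready(d, &tx);                        /* arm: send first character */
    while (*tx.next || d->busy > 0) {
        useful++;                            /* main program does real work */
        device_tick(d);
        if (d->busy == 0) on_ready(d, &tx);  /* simulated interrupt */
    }
    return useful;
}
```

With a latency of 3 ticks and the text "abc", send_polled reports 6 ticks spent doing nothing but waiting, while send_interrupt delivers the same three characters and leaves every tick available to the main program.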

Blocking vs non-blocking
Basically, the non-blocking calls are calls that will cause a message (interrupt) at some later time, and blocking calls are those that do something and then poll until they get an answer and THEN they return.
An example is the DNS name calls. These are used to get an Internet address which corresponds to a Domain name, or vice-versa. Out on the net are DNS name servers whose job in life is to convert from one to the other. If you use the blocking call to convert a Domain name to an Internet address (more on this later) then Winsock calls out on the network to the Domain Name System to get the matching Internet address. This takes time, especially if no match is found. Winsock will not return to you until it gets the response or times out. So your program acts like it is "hung". In VB you are supposed to use the DoEvents call to return control to Windows. There is a similar call in C. Apparently most Winsock programmers forgot to do this in their polling loop, another reason why your system seems to 'hang'. This is supposed to go away in Win95, which doesn't need DoEvents as it is supposed to interrupt any program periodically to check other programs.
Better to use the non-blocking call. Winsock starts the name lookup and returns to your program. Later, on name resolution (don't you love this techie terminology?) or time out, Winsock will "post" (send) a message to your message routine. Your main program is interrupted wherever it is, and the message event routine is started. It gets the message parameters, does whatever you want (like updating an array of values relating to that name/address) and then terminates. Your main resumes where it was interrupted as if nothing happened.
Congratulations, you now know from blocking and non-blocking calls. The non-blocking (message using) calls are prefixed with the term 'WSAAsync'. (Async is another painful way of trying to describe interrupt activity.)
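The "start it, return, get a message later" pattern can be sketched in a few lines of C. This is only a simulation under loud assumptions: post_message, start_lookup, pump_messages, the message number and the name "example.test" are all made up for illustration, and the fake lookup answers immediately instead of going out on a network. None of this is the real Winsock API.

```c
#include <assert.h>
#include <string.h>

#define MSG_LOOKUP_DONE 1   /* hypothetical message number */
#define QUEUE_MAX 8

typedef struct { int id; unsigned long result; } Message;

static Message queue[QUEUE_MAX];
static int queue_len = 0;

/* Stand-in for Winsock "posting" a message to the application. */
static void post_message(int id, unsigned long result) {
    assert(queue_len < QUEUE_MAX);
    queue[queue_len].id = id;
    queue[queue_len].result = result;
    queue_len++;
}

/* Stand-in for a WSAAsync-style call: kick off the work and return at
   once. The answer arrives later as a message, not as a return value. */
static void start_lookup(const char *name) {
    unsigned long fake = (strcmp(name, "example.test") == 0) ? 0x0100007FUL : 0UL;
    post_message(MSG_LOOKUP_DONE, fake);
}

static unsigned long last_address = 0;
static int handled = 0;

/* The message routine (the "event"): runs once per posted message. */
static void on_message(const Message *m) {
    if (m->id == MSG_LOOKUP_DONE) {
        last_address = m->result;
        handled++;
    }
}

/* The message loop: hand every queued message to the message routine. */
static void pump_messages(void) {
    int i;
    for (i = 0; i < queue_len; i++) on_message(&queue[i]);
    queue_len = 0;
}
```

The point to notice is that after start_lookup returns, nothing has been handled yet; the result only shows up once the message loop gets around to delivering the message.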

Addresses, structures, pointers etc.:
Now we get to C. (It was inevitable.) C was designed as a way to write system programs that could be "portable", that is the same source code could be compiled on different computers and the resulting executable code would magically work. A GREAT idea, and it sometimes almost is true. Then applications programmers (actually schools started teaching it) discovered that they could also write their stuff in C, if only a leetle teeney bit was added. C mushroomed. And now everybody thinks that if it ain't written in C it ain't any good.
My own take is, portability notwithstanding, if you need systems performance, write it in assembler. Ever look at a reverse assembled C program? And, if you want to develop an application program (one that people interact with somehow, AND is usually written to make money...something all those college profs have never thought about) then use the HIGHEST level language you can use.
Think about it, what is the definition of the BEST program? I like to think of it as the one that is developed in the LEAST amount of time, that WORKS, and, VERY important, is MAINTAINABLE. Period. Like it or not programming is a way to create something that ultimately generates bucks. (Most programming is commercial in nature, not recreational.) Please don't respond to differ...this is just my own personal opinion.

C
OK, so now I got to bash C, now let's see some details:
C was an Assembly language substitute designed for portability. Portability requires the code syntax to somehow not be concerned with the underlying machine specifics......like memory addressing, because if you are going to 'port' the code to a different computer then, in all likelihood, its hardware characteristics are different than any other machine. (Now you know why the PC/Intel x86 architecture is so popular, the Hardware is back compatible. Also now you see why non-Intel stuff like Power PCs and Macs don't have all this software written in C for Intel platforms. Write it in C and it's portable...yeah right.)

Memory
Computers have memory. Memory is a huge pile of buckets (locations), each with a unique numerical address, just like houses. These buckets contain other numbers (this is because computers are pretty stupid, they can only add and shift numbers) which are data or, guess what?...addresses. Huh? Let's do that again. Any computer memory location contains a number. This number can be data (number of employees), a DIRECT address (the address of the bucket that contains the number of employees) or an INDIRECT address (the address of the address of the bucket with the number of employees). And, this nested indirect stuff can go on forever, depending on the twists and turns of the programmer's mind who set up the program. Winsock does have some double and triple nested indirect addresses.
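In C the three kinds of bucket contents look like this. It's a minimal sketch with made-up names (employees, via_direct, via_indirect); the '*' counts the hops needed to reach the data.

```c
#include <assert.h>

/* Follow a DIRECT address: one hop to the data. */
static int via_direct(int *address) {
    assert(address != 0);
    return *address;
}

/* Follow an INDIRECT address: one hop to an address, a second to the data. */
static int via_indirect(int **address_of_address) {
    assert(address_of_address != 0 && *address_of_address != 0);
    return **address_of_address;
}

static int demo_employees(void) {
    int   employees = 42;         /* data: the number of employees       */
    int  *direct    = &employees; /* DIRECT address of that bucket       */
    int **indirect  = &direct;    /* INDIRECT: address of the address    */

    employees = 43;               /* change the bucket once...           */

    /* ...and both routes lead to the new value. */
    return via_direct(direct) == 43 && via_indirect(indirect) == 43;
}
```

Changing the one bucket changes what every address leading to it sees, which is exactly why a program can pass addresses around instead of copies of the data.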
This is what C guys are trying to say when they throw around words like "cast". They are trying to eventually get to the number of employees, as we all are. Ever read a C primer? They are full of little diagrams of eyeballs peeking thru fingers at computer memory locations, or of little rulers and sticks with memory locations stacked up. All trying to explain direct and indirect memory addressing. And no wonder, C is supposed to isolate the programmer from the underlying hardware and then they have to explain......the underlying hardware. First time I ever read K&R (the C bible, written by the guys that invented it) they actually bragged that they had NEVER written anything in Assembler. I almost fell out of my chair.
Anyway, Winsock is written in C and messes with data in C format. The documentation shows data structures, passed parameters, etc. in C format. To use Winsock we have to translate that into Visual Basic terms. Also, we have another problem. If C was used to isolate the programmer from the hardware, guess what an even higher language, like VB, does? It REALLY isolates you from the hardware and goes to great lengths to NOT let you mess with memory addressing. VB doesn't even have PEEK and POKE. (Although I must admit, this is one way for a C guy to retaliate....words like PEEK and POKE? It's embarrassing.) Luckily, Windows has a ton of DLLs which you can call from VB. There is everything there needed to talk to Winsock...that's why we're here.
So now we gotta call Winsock to do something. In most cases we have to provide parameters and, after the call, we have to retrieve them. Unfortunately Winsock thinks it is talking to a C program and expects things in a format which VB can't easily do.
(You know, a big reason I got started with this VB/Winsock thing was after reading an Email from some Unix/C weenie who was pontificating to a beginning VB programmer who had the temerity to ask the Big Kahuna if it was possible to use Winsock directly from VB. His Ceeness replied that "due to the inherent limitations of VB it is impossible". I hope he reads this.)

And now for an advertisement: check out www.wwwdev.com for VBServer. It's a Web server written entirely in Visual Basic and uses Winsock DLL calls only. No VBXs to get to Winsock. They definitely are not needed and VBServer works just fine.

GetHostByName, an example.
GetHostByName is a Winsock call that takes a Domain name in 'char' format (a string) and returns a 32 bit (VB long&) value that leads, by way of a structure, to the Internet address associated with the passed Domain name.
First we have to get around the indirect addressing stuff, that is translate that C 'far', 'pointer', '*' stuff into English and then into VB.
OK class, get out your books which have the Winsock call definitions. Let's look at 'gethostbyname'. C calls are functions, they return a value. (Well, VOIDs are functions which don't return anything, which in VB is a Sub, but we don't have to deal with that.) The call format is X=somecall(A,B,C). You set up the parameters A, B, and C, call 'somecall', and X gets set with the return value. Pretty simple. Except that the function may expect A to be the A data, B the address of the B data, and C to be the address of the address of the C data. And maybe X is actually the address of the address of....Get the picture?
Now for GetHostByName. Ignore the return value for now, that's a structure and is a later subject.

Strings
The first parameter says 'const char FAR* name'. That's Sanskrit for the fact that the parameter passed is the address of a string. Strings are different in C and VB. VB stores a description of the string (like length). When VB gets a string it knows how many characters to get because the descriptor tells it how long the string is. C strings are null terminated, that is the last character is followed by a byte whose value is zero. Boy does this cause grief for C programmers if they mess up. If you are processing a string in C and have forgotten the terminating NULL you will run right past the end and will keep getting whatever data is in memory until you finally encounter a null byte. (Like beaucoup K later.)
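The two ways of knowing where a string ends can be put side by side in C. This is a rough sketch: the Described structure is a made-up stand-in for "a length stored alongside the characters" and is NOT VB's actual internal string layout.

```c
#include <assert.h>
#include <stddef.h>

/* C style: the length is not stored anywhere. You walk memory until
   you hit a byte whose value is zero. */
static size_t c_length(const char *s) {
    size_t n = 0;
    assert(s != NULL);
    while (s[n] != '\0') n++;
    return n;
}

/* Descriptor style (hypothetical layout): the length travels with the
   string, so nobody has to go hunting for a terminator. */
typedef struct {
    size_t      length;
    const char *data;
} Described;

static size_t described_length(const Described *d) {
    assert(d != NULL);
    return d->length;
}

static int string_demo(void) {
    /* An embedded zero byte ends the C string early... */
    const char raw[] = "ab\0cd";
    /* ...but a descriptor that says 5 still covers all five bytes. */
    Described d;
    d.length = 5;
    d.data = raw;
    return c_length(raw) == 2 && described_length(&d) == 5;
}
```

Leave the zero byte off the end of a C string and c_length has no way to know; it just keeps walking, which is the run-past-the-end grief described above.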

String conversion from C to VB
Luckily (actually by design), VB will convert its strings to NULL terminated when you call a DLL so you don't have to worry about fooling with VB string descriptors and null-terminated strings. But you DO have to worry on some calls that return strings or addresses of strings because, remember, Winsock thinks it is talking to a C program so will return C-type references (pointers or memory addresses) to C strings. By the way, C-talk for addresses is 'pointer'.

GetHostByName string parameters.
Back to 'const char FAR* name'.
'const'
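The details of the string parameter start with 'const', which is short for constant. It tells the C compiler that Winsock promises not to change the string you pass. Look at the call gethostname: it doesn't have 'const', because Winsock writes the name INTO that buffer, yet what actually gets passed is IDENTICAL: the address of a string. Get used to it, C is full of this type of stuff. Anyway, from VB we ignore it.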
'char'
Next is 'char' which means the variable is a bunch of characters (a C string terminated by a null). We are going to pass a VB string and let VB convert it to C format.
'FAR*'
Then we get 'FAR*' which says the parameter is an address ('*') pointing to the data and that the address is in 'FAR' format. Without going into a lot of detail, a FAR address is 32 bits long in a PC (VB long& data type). In assembler we use two 16 bit values for a 32 bit address, segment and offset. Any address in a PC can be described this way. A group of 16 bit addresses from 0 to all ones is 64K. Thus the total address space you can reference using the SAME segment number is 64K. Sound familiar? Yep, it's the Intel 64K segment 'limitation'. Well, not really a limitation, just change the segment number and get more. This is how mainframe/mini memory management works. You could design a new computer that uses more address bits and see more than 64K, but then all those memory address paths have to change, and finally, how many bits? No matter how many you use you'll be accused at a later date of creating a memory segment 'limitation'. (Just like now with the 68xxx stuff used in Macs, which use this 'flat' memory model.) Anyway, in Winsock (PC/Intel) all addresses are 32 bits or FAR and thus fit into a VB type long& variable.
'name'
Finally 'name' stands for the name of the variable you will provide.
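The segment and offset arithmetic is easy to check with a few lines of C. A minimal sketch, with made-up helper names, that only shows how two 16 bit halves pack into one 32 bit value (segment in the high half, offset in the low half) and why one segment covers 64K; it does not touch real memory.

```c
#include <assert.h>
#include <stdint.h>

/* Pack a 16 bit segment and a 16 bit offset into one 32 bit FAR value. */
static uint32_t make_far(uint16_t segment, uint16_t offset) {
    return ((uint32_t)segment << 16) | offset;
}

/* Split a 32 bit FAR value back into its two halves. */
static uint16_t far_segment(uint32_t far_address) {
    return (uint16_t)(far_address >> 16);
}

static uint16_t far_offset(uint32_t far_address) {
    return (uint16_t)(far_address & 0xFFFFu);
}

/* Offsets run from 0 to all ones (0xFFFF), so one segment spans 64K. */
static uint32_t bytes_per_segment(void) {
    return (uint32_t)0xFFFFu + 1u;
}

static void far_demo(void) {
    uint32_t p = make_far(0x1234, 0x5678);
    assert(p == 0x12345678u);              /* fits in a VB long&        */
    assert(far_segment(p) == 0x1234);
    assert(far_offset(p) == 0x5678);
    assert(bytes_per_segment() == 65536u); /* the 64K "limitation"      */
}
```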

So a call to GetHostByName in VB looks like:
hostent& = gethostbyname(my_name$)

Structures
Now what about hostent, the variable that gets set by Winsock on return from the call? This is where I spent a LOT of time, trying to figure out what all those structures in the spec were for and how they really looked in memory. Once you figure this out you've got it. Oh, yeah, structures. VB calls them 'user defined types'.
What you are really doing is describing a chunk of memory........
I got this here chunk and the first word (16 bits) is an integer%, the next 2 words (16 bits * 2 = 32 bits) are a long&, and the last 20 bytes (8 bits each) are characters.

Byte data
VB will not let you directly access bytes (PEEK and POKE, INP and OUT used to). But there is a technique which gets around this, although it's a little clumsy. (Why can't Microsoft give us a byte variable?) What you do is define a fixed string x bytes long. Now VB will use these bytes as characters. But to get the actual number (data) in the character position we have to use the VB ASC() function. So, for example, I use, on byte-oriented operations, a fixed-string variable that I define as: abyte as string*1. Then, if for example I read one byte from a binary file (GET #1,,abyte), and the data in the file at that byte position was, say, 99 decimal, then after the assignment somevariable%=asc(abyte), somevariable% is set to 99. Cool....but clumsy.
The above example type defining the chunk of memory is:

type my_type
    an_integer as integer
    a_long as long
    twenny_bytes as string*20
end type
C does the same thing. The trick is to describe the memory layout using the VB syntax and then set the values and then call Winsock, providing the address of the structure(type).
Returned structures (like hostent), however, require a little more work, depending on the Winsock call.
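For comparison, here is how the same chunk of memory (my_type above) might be described on the C side. I'm assuming the fields sit back to back with no padding, which is what the word-by-word description implies; the #pragma pack line forces the C compiler to lay it out that way. The fixed-width type names come from stdint.h and the helper functions are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#pragma pack(push, 1)          /* assumption: no padding between fields */
typedef struct {
    int16_t an_integer;        /* VB integer%: 1 word  = 2 bytes  */
    int32_t a_long;            /* VB long&:    2 words = 4 bytes  */
    char    twenny_bytes[20];  /* VB string*20:          20 bytes */
} my_type;
#pragma pack(pop)

/* Where each field starts, counted in bytes from the top of the chunk. */
static size_t offset_of_integer(void) { return offsetof(my_type, an_integer); }
static size_t offset_of_long(void)    { return offsetof(my_type, a_long); }
static size_t offset_of_bytes(void)   { return offsetof(my_type, twenny_bytes); }

/* Total size of the chunk: what VB's Len() would be counting. */
static size_t my_type_size(void) { return sizeof(my_type); }

static void layout_demo(void) {
    assert(offset_of_integer() == 0);
    assert(offset_of_long() == 2);
    assert(offset_of_bytes() == 6);
    assert(my_type_size() == 26);   /* 2 + 4 + 20 */
}
```

If the two sides disagree about even one of those offsets, every field after it is read from the wrong bytes, so it pays to add the sizes up by hand once.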

Winsock returned structures (VB types)
Some Winsock calls let you PASS a structure that gets filled in by Winsock. These are easy, as you tell Winsock where to put the returned data by passing it the address of your structure in the call. The Winsock call 'connect' does this.
But GetHostByName does things a little differently. Remember that the returned parameter was something called 'hostent&', a VB long& value. This is the address of the hostent structure SOMEWHERE in Winsock's own memory space. What Winsock did was get the data and put it into a chunk of Winsock's memory which is mapped like the hostent structure and then returned the address of the chunk's (structure's) first memory location. Frustrating, we can't get to it because when we DIMensioned OUR hostent structure, VB used VB's memory space. Nooo problem, just copy from the Winsock structure to the VB structure. How many bytes? Well how about using the VB LEN() function on hostent? Use len(hostent) and you don't have to manually count how many bytes. Once the data is copied, we can now get to it using our VB structure tags.
Aaaaand another reason to copy....the spec indicates that because Winsock is busy doing its thing it may reallocate the chunk of memory (reuse it), so if we want the data we better copy it fast. Also, I've had experience with the dreaded GPF which may have come from me trying to mess with memory either outside mine and protected or maybe Winsock was trying to change it when I was accessing it. At any rate, copy it first thing.
ALSO, more details. Remember that Winsock uses these structures to pass data and ADDRESSES of data (or addresses of addresses of data). Where do you think these are? You got it, in Winsock's memory area again. After a while you may figure out (there is a vague reference to this in the spec) that Winsock tucks in the actual data right after the structure it allocates. This means to be safe you need to define extra memory locations right AFTER your structure. Then, when you do the copy the length will include the extra stuff and it'll sit there in VB's memory space right after the VB type, and the VB pointers will point to it. (Please, those of you who know about Relative vs Absolute memory addresses be quiet.)
OK, now we know how to get the Winsock structure data into our VB type, now we can access the data.......well, almost.

BYVal and BYReference
Remember memory buckets? The memory location can hold data, an address, an address of an address...ad nauseam. Well, VB is pretty simple and basically uses either data or address of data. Thus we need to be able to define VB data as to what it is and VB only talks about two types: data and address of data. When we call DLLs we need to know if the DLL wants a data type or an address of data type. This is where ByVal comes in. ByVal indicates that it is the data method. The absence of ByVal is the address of the data method. That was easy. (A special exception is string data. ByVal is used for something entirely (kinda) different. ByVal forces VB to PASS the specified string as the address of a NULL terminated C string. How do we convert this VB string to a C string? We don't, VB does it automatically. You must always use ByVal for strings, as you can see in the calls in VB format.)
So, if the Winsock function has '*' then it wants or gives a pointer (address), if not then it wants or gives data. If data is used then we use ByVal.
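Seen from the C side, the difference between the two methods is whether the called function gets its own copy or gets to reach back into your variable. A minimal sketch with made-up function names:

```c
#include <assert.h>

/* No '*' in the prototype: the function wants DATA.
   This is the case where the VB Declare says ByVal.
   The function works on its own copy. */
static long takes_data(long value) {
    value = 99;          /* changes only the local copy */
    return value;
}

/* A '*' in the prototype: the function wants the ADDRESS of the data.
   This is the VB Declare without ByVal.
   The function can change the caller's variable. */
static void takes_address(long *value) {
    assert(value != 0);
    *value = 99;         /* changes the caller's bucket */
}

static long demo_passing(void) {
    long n = 1;
    takes_data(n);
    assert(n == 1);      /* untouched: only a copy was passed   */
    takes_address(&n);
    return n;            /* now 99: the function had our address */
}
```

Get this wrong in a Declare and the DLL treats your data as an address (or the other way around), which is one quick route to a GPF.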

hostent
Now back to structures and indirect addressing. Check out the hostent structure in C syntax:

struct hostent {
    char FAR* h_name;
    char FAR* FAR* h_aliases;
    short h_addrtype;
    short h_length;
    char FAR* FAR* h_addr_list;
};
h_name is a null terminated string. Remember, it is NOT the string, but the address of the string, and it's a FAR address so it's 32 bits, which in VB is a LONG&.
Let's skip to 'short'. This is one way of describing an integer value in a PC in C. It's not the only way and you will see others. Anyway, h_addrtype and h_length are VB integers.
Now for the fun. The FAR* FAR* 's are...that's right an address of an address of the data. And VB can't be told to get this. That's ok though, we have ways, but first more detail (I know, I know, but it beats the h--- out of digging ditches). But let's do this graphically (I used to teach so I always dig a blackboard).
Let's pretend we have a very small computer. 10 memory locations. In location 9 is the data, say the number 73. In location 3 we put the address (9) of the data. Then, just to confuse the heck out of everyone we'll put the address of the address (3) into location 7. All unused locations contain 0 (don't ever count on this in the real world).
                1 contains 0
                2 contains 0
     FAR*FAR*   3 contains 9 ----> points to location 9
                4 contains 0
                5 contains 0
                6 contains 0
 FAR*FAR*FAR*   7 contains 3 ----> points to location 3
                8 contains 0
     FAR*       9 contains 73 <--- the data        
                10 contains 0      
Our data is 73. But we can refer to it by any of the associated addresses. If we say that our pointer is FAR*FAR*FAR* and that it is a 7 then we are saying that our pointer (7) points to another pointer (3) which points to the data (contents of 9). OK, so what? Well, if we can only get to data in VB directly, that is by using its address, then somehow we have to get to the number 9 which is the address of the data. Hmmm, if we had PEEK we could PEEK in 7. We would get 3. Then we PEEK in 3 and we get 9.....the address of the data! Oh, yeah, we don't have PEEK. But we do have (drumroll) hmemcpy, which, of all things, is a C call in the Windows DLL (Cymbal).
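The ten-location computer is small enough to simulate outright. In this sketch (plain C, made-up names) memory is an array, peek is the PEEK we wish VB had, and follow just PEEKs repeatedly. Unused locations hold 0, as stated in the text.

```c
#include <assert.h>

/* The ten-location computer. Index 0 is unused so that the array
   index matches the location numbers 1 through 10. */
static const int memory[11] = {
    0,      /* (unused)                         */
    0,      /* 1                                */
    0,      /* 2                                */
    9,      /* 3: address of the data           */
    0,      /* 4                                */
    0,      /* 5                                */
    0,      /* 6                                */
    3,      /* 7: address of the address        */
    0,      /* 8                                */
    73,     /* 9: the data                      */
    0       /* 10                               */
};

/* PEEK: return whatever number is in the given location. */
static int peek(int address) {
    assert(address >= 1 && address <= 10);
    return memory[address];
}

/* Start at an address and PEEK `hops` times, each time treating the
   number just read as the next address to look in. */
static int follow(int address, int hops) {
    while (hops-- > 0) address = peek(address);
    return address;
}

static void tiny_computer_demo(void) {
    assert(follow(7, 1) == 3);    /* PEEK in 7 gives 3               */
    assert(follow(7, 2) == 9);    /* PEEK in 3 gives 9, the address  */
    assert(follow(7, 3) == 73);   /* PEEK in 9 gives the data        */
}
```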

hmemcpy
hmemcpy is a nifty little call that copies a block of memory from one location to another. You just specify the 'from' address, the 'to' address, and how many bytes. (Technical talk for the 'from' or 'source' address is 'Goesoutta' and the destination or 'to' address is 'Goesinta').
Two tricks here...pay attention. One, the pointers are all FAR or 32 bits which is 4 bytes. So our memory blocks for holding FAR addresses are 4 bytes each. Two, and VERY important, we can force the way we pass the parameters to hmemcpy. What? Easy, make the destination by reference and the source by value (ByVal). Instant PEEK without the stigma of a silly word. In fact C guys will be impressed...."Oh yeah, (sigh) I use hmemcpy"...wow. (ALWAYS speak in lower case.) Now to use gethostbyname you define the above C structure in VB type lingo as:

Type hostent_type     
    h_name As Long    
    h_aliases As Long 
    h_addrtype As Integer
    h_length As Integer
    h_addr_list As Long
End Type
Global hostent as hostent_type
hostent.h_addr_list points to...a list of addresses for this host. Don't worry, just take the first in the list. Do it like this:
Below is a code fragment from VBServer. Notice that I copy the passed Winsock C hostent structure into, or on top of my VB allocated hostent structure. Winsock builds his structure in his memory space. We need to get to the various fields (pieces) so if we copy it into our space we can use our field names. So first we call gethostbyname. Winsock then sets the return value as a 32 bit address (pointer) to his structure in his memory space. We copy from his space to our VB type (structure) in our space. Then we start the hmemcpy stuff to get at the list. When we are done, list& points to the first data in the list, which is the Internet address of the passed Domain name. NOTE, this address is an INTERNET address and NOT a memory address. Well hey, there's only so many ways to say address.
'get host address by name
'returns address of
'winsock hostent structure
ha& = gethostbyname(hostname$)
'a return of 0 means the lookup failed, so there is nothing to copy
If ha& <> 0 Then
    'copy winsock structure to VBServer structure
    hmemcpy hostent.h_name, ByVal ha&, Len(hostent)
    'get address of list
    listaa& = hostent.h_addr_list
    hmemcpy lista&, ByVal listaa&, 4
    'get first list entry
    hmemcpy internet_address&, ByVal lista&, 4
End If
Trust me, it works. And that is basically that.
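If you'd rather not just trust me, the same three copies can be tried out in C without Winsock or a network. This is a sketch under loud assumptions: fake_hostent and fake_gethostbyname are made-up stand-ins for the structure and the call, the "library" memory is just some static variables, the address 192.0.2.7 is an arbitrary example, and the standard memcpy plays the part of hmemcpy. Pointers here are whatever size the compiler uses, so the copies use sizeof instead of a hard-wired 4.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the structure Winsock hands back (same field order). */
struct fake_hostent {
    char  *h_name;
    char **h_aliases;
    short  h_addrtype;
    short  h_length;
    char **h_addr_list;
};

/* "Library-owned" memory: the address bytes, the list of pointers to
   them, the name, and the structure that ties it all together. */
static unsigned char lib_addr[4] = { 192, 0, 2, 7 };
static char *lib_list[2] = { (char *)lib_addr, 0 };
static char  lib_name[] = "example.test";
static struct fake_hostent lib_hostent = { lib_name, 0, 2, 4, lib_list };

/* Stand-in for gethostbyname: returns the address of the structure. */
static const void *fake_gethostbyname(void) {
    return &lib_hostent;
}

/* The three copies from the VB fragment, one memcpy each.
   Returns 1 and fills out[] on success, 0 if there is nothing to read. */
static int first_address(const void *ha, unsigned char out[4]) {
    struct fake_hostent mine;
    char *entry;

    assert(out != 0);
    if (ha == 0) return 0;                           /* lookup failed      */
    memcpy(&mine, ha, sizeof mine);                  /* structure to ours  */
    memcpy(&entry, mine.h_addr_list, sizeof entry);  /* PEEK: first entry  */
    if (entry == 0) return 0;                        /* empty list         */
    memcpy(out, entry, 4);                           /* PEEK: the address  */
    return 1;
}
```

Each memcpy is one PEEK: copy the structure, read the first pointer out of the list, then read the four address bytes that pointer leads to.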
There is one other Windows DLL call that needs to be used. The last Winsock call we did was gethostbyname. We passed a Domain name (string) and Winsock gave us a pointer to a structure which among other things, contains a pointer to the associated Internet address. We then went thru a lot of gyrations to get the 32 bit Internet Address.

Dot Addresses
Usually, people don't want to read the Internet address as a 32 bit number because it's too big, and portions of it mean different things. Once we have the 32 bit Internet Address we can translate it into 'dot address' format, which is what humans like to see. There is a Winsock function which does this translation, although it really could be computed in code, but why do it when Winsock will?
The usage format is:
inet_ntoa
This call will convert a passed 32 bit Internet Address into a string containing the 'dot address'. Sounds simple? It is, but if you have been peeking at the spec you noticed that the call passes a structure. Further, if you went to the structure definitions you saw structure in_addr defined in some really strange looking C code with the word 'UNION' buried in it:

struct in_addr{
    union{
        struct {u_char s_b1,s_b2,s_b3,s_b4;}S_un_b;
        struct {u_short s_w1,s_w2;}S_un_w;
        u_long S_addr;
    }S_un;
    etc.,etc., and etc.
Whatever in the world is that? Hey, C is sooo much more user-friendly than Assembler. Let's see if the VB equivalent makes more sense:
type in_addr_as_4bytes
    addr_byte1 as string * 1
    addr_byte2 as string * 1
    addr_byte3 as string * 1
    addr_byte4 as string * 1
end type
type in_addr_as_2words
    addr_word1 as integer 
    addr_word2 as integer
end type
type in_addr_as_1long
    addr_long as long
end type
Ah-ha! This is only a way to get to the bytes in a long&, or the words% (integers) in a long&, or the long& itself. Why? I dunno. There is no code that I can find that uses it. So our call is really looking for a long&:
char FAR* PASCAL FAR inet_ntoa (struct in_addr in);
is:
Declare Function inet_ntoa Lib "winsock.dll" (ByVal iaddr As Long) As Long

and gets used as:
dota& = inet_ntoa&(internet_addr&)          'internet_addr& is our 32 bit Internet
                                            'address from the last call.
'lstrcpy needs a blank VB target string    'later, later
dotaddr$ = Space$(256)
temp& = lstrcpy&(dotaddr$, dota&)
'Get rid of nulls copied from Winsock, you gotta write this. It just
'uses a for loop to scan the string byte by byte and replaces if it gets a match.
server_dotaddr$ = replacechar(dotaddr$, Chr$(0), " ")
'And trim it
server_dotaddr$ = Trim$(server_dotaddr$)
Whoa, what happened? Well, we got a returned value 'dota&' which is a 32 bit pointer to a C string which contains the dot address. We want to copy it into a VB string. We FIRST set the VB string to 256 spaces so lstrcpy has a bucket to put the C string into and then call it. Then, we get rid of any NULLs copied. And then we trim it. What the heck is lstrcpy? A good way to copy C strings into VB strings, that's what.
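For the curious, here is roughly what "computing it in code" looks like, using a union in the spirit of in_addr to get at the four bytes of the 32 bit value. It's a sketch, not Winsock's implementation: addr_view, make_addr and dot_address are made-up names, and I'm assuming the four bytes sit in memory in the order they appear in the dot address, which is how the Internet address comes back from the lookup.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One chunk of memory, three ways to look at it: as 4 bytes,
   as 2 words, or as 1 long. That's all the UNION is saying. */
typedef union {
    uint8_t  bytes[4];
    uint16_t words[2];
    uint32_t as_long;
} addr_view;

/* Build a 32 bit Internet address from its four bytes, in memory order. */
static uint32_t make_addr(uint8_t b1, uint8_t b2, uint8_t b3, uint8_t b4) {
    addr_view v;
    v.bytes[0] = b1;
    v.bytes[1] = b2;
    v.bytes[2] = b3;
    v.bytes[3] = b4;
    return v.as_long;
}

/* Write the dot address for internet_addr into out.
   "255.255.255.255" plus the terminating null needs 16 bytes. */
static void dot_address(uint32_t internet_addr, char *out, size_t out_size) {
    addr_view v;
    assert(out != NULL && out_size >= 16);
    v.as_long = internet_addr;
    snprintf(out, out_size, "%u.%u.%u.%u",
             (unsigned)v.bytes[0], (unsigned)v.bytes[1],
             (unsigned)v.bytes[2], (unsigned)v.bytes[3]);
}

static int dot_demo(void) {
    char text[16];
    dot_address(make_addr(192, 0, 2, 7), text, sizeof text);
    return strcmp(text, "192.0.2.7") == 0;
}
```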

lstrcpy
Wow, another C function in the Windows DLL. What ammo for the next beer-bust with a bunch of C guys. This function copies a C string from a 32 bit address into a VB string. Well, into space ALLOCATED for a VB string. This includes everything, including the terminating NULL, so we have to get rid of that.
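What ends up in the pre-spaced bucket can be seen with a small C sketch. The names fill_and_copy and length_before_nul are made up, and the standard memset and memcpy stand in for Space$ and lstrcpy. One difference is deliberate: this version checks that the string fits, while lstrcpy takes no size and checks nothing, which is why the bucket has to be big enough before the call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Fill the bucket with spaces (like Space$), then copy the C string
   into it INCLUDING the terminating null (like lstrcpy does).
   Returns the number of characters before the null. */
static size_t fill_and_copy(char *buffer, size_t size, const char *src) {
    size_t n;
    assert(buffer != NULL && src != NULL);
    n = strlen(src);
    assert(n + 1 <= size);           /* the string plus its null must fit */
    memset(buffer, ' ', size);       /* the bucket of spaces              */
    memcpy(buffer, src, n + 1);      /* characters plus terminating null  */
    return n;
}

/* Find how much of the bucket is real text: everything before the
   first null byte. If there is no null, the whole bucket counts. */
static size_t length_before_nul(const char *buffer, size_t size) {
    const char *nul = memchr(buffer, '\0', size);
    return nul ? (size_t)(nul - buffer) : size;
}

static int copy_demo(void) {
    char bucket[32];
    size_t n = fill_and_copy(bucket, sizeof bucket, "192.0.2.7");
    /* Text, then the null that came along, then leftover spaces. */
    return n == 9
        && bucket[9] == '\0'
        && bucket[10] == ' '
        && length_before_nul(bucket, sizeof bucket) == 9;
}
```

So the bucket holds the text, one null, and then the leftover spaces, which is why the VB side has to deal with the null before trimming.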


There are, obviously, a lot more calls in Winsock but after this hopefully you get the idea. You can now get started. VBrowser source code contains all the Winsock function and type declarations for a Web Browser, plus actual Winsock calls with plenty of documentation. Look at it as a working textbook. I did not put any HTML parsing into the code as vanilla VB does have limitations as to how and what you can display at one time. There are special tools out there that will allow you to build the required forms to mix fonts, colors, images, etc.


Last Updated: Monday, October 23, 1995